Abstract — In this paper, motivated by network inference and
tomography applications, we study the problem of compressive 
sensing for sparse signal vectors over graphs. In particular, we are 
interested in recovering sparse vectors representing the properties 
of the edges from a graph. Unlike existing compressive sensing 
results, the collective additive measurements we are allowed to 
take must follow connected paths over the underlying graph. For 
a sufficiently connected graph with n nodes, it is shown that, 
using O(k log(n)) path measurements, we are able to recover
any k-sparse link vector (with no more than k nonzero elements),
even though the measurements have to follow the graph path 
constraints. We further show that the computationally efficient 
$\ell_1$ minimization can provide theoretical guarantees for inferring
such k-sparse vectors with O(k log(n)) path measurements from
the graph. 

I. Introduction 

In operations of communication networks, we are often 
interested in inferring and monitoring the network performance 
characteristics, such as delay and packet loss rate, associated 
with each link. However, making direct measurements and 
monitoring for each link can be costly and operationally diffi- 
cult, often requiring the participation from routers or potentially 
unreliable middle network nodes. Sometimes the responses 
from the middle network nodes are unavailable due to physical 
or protocol constraints. This raises the question of whether it 
is possible to quickly infer and monitor the network link char- 
acteristics from indirect end-to-end (aggregate) measurements. 
The problem falls in the area of network tomography, which is 
useful for network traffic engineering [26] and fault diagnosis
[17], [18], [22], [24]. Because of its importance in practice, network
tomography has seen a surge in excellent research activities 
performed from different angles, for example, [3], [7], [9], [13],
[15], [18], [19], [20], [21], [22]. In this paper, we propose to study
the basic network tomography problem from the angle of 
"compressive sensing", which aims to recover parsimonious 
signals from underdetermined or incomplete observations. 

Compressive sensing is a new paradigm in signal processing
theory, which aims to sample and recover parsimonious
signals efficiently. It has seen quick acceptance in such applications
as seismology, error correction and medical imaging
since the breakthrough works [4], [5], [6], [12], although its role in
networking is still limited [10], [11], [16], [26]. Its basic idea is that
if an object being measured is well-approximated by a lower
dimensional object (e.g., sparse vector, low-rank matrix, etc.)
in an appropriate space, one can exploit this property to achieve
perfect recovery of the object. Compressive sensing [4], [6], [12]
characterizes this phenomenon for sparse signal vectors, and
presents efficient signal recovery schemes, from a small number
of measurements. Recent works have started to extend this
framework to the efficient inference of low-rank matrices [5].



In this paper, we propose a compressive sensing approach for 
network (graph) tomography by exploiting the sparse signal 
structures therein. For example, it is very common that only 
a small fraction of network links are experiencing congestion 
or large packet loss rates. Compressive sensing appears to be
the right tool to infer those sparse characteristics. However,
many existing results of compressive sensing critically rely 
on assumptions that do not hold for network applications. For 
example, in network tomography, a measurement matrix is in 
a more restrictive class, taking only nonnegative integers while 
random Gaussian measurement matrices are commonly used in 
current compressive sensing literature. More importantly, as we 
will see, measurements are restricted by network topology and 
network operation constraints which are again absent in existing 
compressive sensing research. Overall, compressive sensing for 
network tomography, compared with other compressive sensing 
problems, is quite different and interesting in its own right 
because of its close connection to graphs. It is therefore not 
clear whether we have theoretical guarantees for recovering 
individual link characteristics using underdetermined observa- 
tions under graph topology constraints and if so, how to do it. 
This paper answers these two fundamental questions. 

More concretely, bridging the gap between compressive 
sensing and graph theory, we study compressive sensing over 
graphs. The signal vectors to be recovered are sparse vectors 
representing the link parameters of a graph. We are allowed 
to take measurements following paths (walks) over the graph. 
We have the following two main results: for a sufficiently
connected graph with n nodes, even under the graph
path constraints,

• O(k log(n)) path measurements are sufficient for identifying
any k-sparse link vector (for example, identifying k
congested links);

• $\ell_1$ minimization has a theoretical guarantee of recovering
any k-sparse link vector with O(k log(n)) path measurements.

The paper is organized as follows. In Section II we give
the problem formulation, explain the special properties of
compressive sensing over graphs, and compare it with graph
constrained group testing problems. In Section III we show that
O(k log(n)) path measurements are sufficient for compressive
sensing over graphs. In Section IV we show that $\ell_1$ minimization
can provably guarantee the performance of compressive
sensing over graphs. Section V presents numerical examples to
confirm our predictions. We conclude in Section VI.

II. Problem Formulation and Related Works 

We consider a network, represented by an undirected graph
G = (V, E), where V is the vertex (or node) set with cardinality
|V| = n, and E is the edge (or link) set with cardinality
|E|. Communications between vertices can only occur over
these edges. Over each undirected edge between two vertices,
communications can occur in both directions.¹ We also assume
that each communication route must be a connected path over
this undirected graph.

Fig. 1: A Network Example

Suppose that we have probes along m source-destination
pairs over a network (|E| > m, otherwise the problem is not
interesting). We are interested in identifying certain links from
the probe measurements, for example, the congested links with
large delays or high packet loss rates. We note that the delay
over each source-destination pair is a sum of the delays over
each edge on the route between this source-destination pair,
giving a natural linear mixing of the link delays on the route.
Abstractly, let x be an |E| × 1 non-negative vector whose j-th
element represents the delay (or $-\log(1 - P_j)$, where $P_j$ is the
packet loss rate over link j) over edge j and let y be an m × 1
dimensional vector whose i-th element is the end-to-end delay
(or $-\log(1 - P)$, where P is the packet loss rate for the whole
path) measurement for the i-th source-destination pair. Then

$$y = Ax, \tag{1}$$

where A is an m × |E| matrix, whose element in the i-th row
and j-th column is '1' if the i-th source-destination pair routes
through the j-th link and '0' otherwise. For example, for a
network with |E| = 6 links and m = 4 paths in Figure 1, the
measurement matrix A is:

/ 1 1 \ 

110 

10 10' *■ ; 

\ 1 1 1 J 

¹ This undirected graph model has been used for communications networks
such as optical networks [15], [24]. Our work can also be extended to
directed graph models. We also allow paths to visit an edge multiple times.

The question now is whether we can estimate the link
vector x, using the path measurement y. Although |E| > m
means we only have an underdetermined system, estimation is still
possible if we know x is a sparse vector, which in practice
can often be a reasonable assumption. For example, only a small
fraction of links are congested, i.e., have
link delays considerably larger than the delays over
other links. In other words, the vector x representing the
delays over links is a spiky (or approximately sparse) vector.
This provides the foundation to link our network tomography
problems to compressive sensing. There are however important
differences between network tomography problems and the general
compressive sensing formulation:

• Because the measurements are made over communication
paths, the element $A_{ij}$ of A is either 0, when the
measurement path i does not go through link j, or an
integer b, when the measurement path i goes through link
j for b > 0 times. Generally, the number b is '1', which
often makes the matrix a '0' and '1' matrix.

• More importantly, besides being a '0-natural number'
matrix, A also has to satisfy the path constraints over the
graph. Namely, all the nonzero elements in row i of A
must correspond to a connected path. Even for the complete
graph in Figure 1, a row of A cannot take the form
(0, 0, 0, 0, 1, 1). This is because no path can traverse only
link 5 and link 6.

• In many cases, the sparse link vectors we are interested
in are nonnegative vectors, for instance, the delay vector
and the vector of $-\log(1 - P_j)$ values obtained from the packet loss rates.

Finally, we want to compare our study with a closely 
related topic, graph-constrained group testing (TJ, (8), iTPfl . 
lfT31l . f2l\. Compressive sensing over graphs involves y which 
can take values over real numbers, instead of 'true-or-false'
binary values for the group testing problems. The measurement
result y is the additive linear mixing of the vector x over
real numbers, in contrast to the logic OR operation for group
testing problems. Consider a simple example: if the delay vector
x for the network in Figure 1 is $(2, 3, 0, 0, 0, 0)^T$, then in
compressive sensing, $y = (5, 0, 3, 0)^T$; while in group testing,
y = (Y, N, Y, N), where Y and N represent "congested" and
"not congested" respectively. From compressive sensing, by a
simple check, we know $x = (2, 3, 0, 0, 0, 0)^T$ is the only
sparsest solution that satisfies y = Ax; however, group testing
will decide that $x = (N, Y, N, N, N, N)^T$. But in fact, there
is no 1-sparse x that can generate such a $y = (5, 0, 3, 0)^T$.
This hints that compressive sensing can do better than group
testing in terms of needed measurements, which will be further
quantified in Table I.
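The contrast between additive and OR-type measurements can be checked on a small instance. The sketch below is only illustrative: the 4 × 6 matrix is a hypothetical stand-in chosen to reproduce the measurement values quoted above on a complete graph with four vertices, and the edge ordering and all function names are assumptions rather than the paper's.

```python
from itertools import combinations

# Hypothetical path matrix (an assumption, not necessarily the matrix of Fig. 1).
# Columns are the edges (1,2), (2,3), (3,4), (1,4), (1,3), (2,4) of K4.
A = [
    [1, 1, 0, 0, 0, 0],  # path 1-2-3
    [0, 0, 1, 1, 0, 0],  # path 3-4-1
    [0, 1, 0, 0, 1, 0],  # path 2-3-1
    [0, 0, 0, 1, 1, 1],  # path 3-1-4-2
]
x = [2, 3, 0, 0, 0, 0]  # link delays; links 1 and 2 are congested


def additive_measurements(A, x):
    """Compressive sensing: each path reports the sum of its link values."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def group_test_outcomes(A, defective):
    """Group testing: a path is positive iff it contains a defective link."""
    return [any(row[j] for j in defective) for row in A]


def smallest_consistent_sets(A, outcomes):
    """All smallest link sets whose OR-outcomes match the observed ones."""
    n = len(A[0])
    for size in range(n + 1):
        hits = [set(s) for s in combinations(range(n), size)
                if group_test_outcomes(A, s) == outcomes]
        if hits:
            return hits
    return []


def explained_by_one_link(A, y):
    """True if some single column, scaled, reproduces y exactly."""
    for j in range(len(A[0])):
        col = [row[j] for row in A]
        pivot = next((i for i, a in enumerate(col) if a), None)
        if pivot is None:
            continue
        scale = y[pivot] / col[pivot]
        if all(scale * a == v for a, v in zip(col, y)):
            return True
    return False


y = additive_measurements(A, x)
outcomes = [v > 0 for v in y]
print(y, smallest_consistent_sets(A, outcomes), explained_by_one_link(A, y))
```

On this instance the OR-outcomes are explained by link 2 alone (index 1), whereas no single link reproduces the additive measurements, which is the distinction the example in the text draws.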



III. When is Compressive Sensing over Graphs 
Possible? 

In this section, we focus on the question of how many path
observations suffice to recover any k network edge failures.
First, in order of increasingly demanding requirements,
we give three conditions on the measurement matrix A to
guarantee recovering k-sparse link vectors (Theorems 1, 2 and
3). Then we show that a measurement matrix generated from
random walks will be able to recover any k-sparse vector using
only O(k log(n)) measurements.






A. Success Conditions for Compressive Sensing 

Theorem 1. Let y = Ax. If x is a nonnegative signal
vector with no more than k nonzero elements, with

$$k < \min_{w \in \mathcal{N}(A),\, w \neq 0} \max\{k_{-,w}, k_{+,w}\},$$

where $\mathcal{N}(A)$ is the null space of A, and $k_{-,w}$ and $k_{+,w}$ are
the numbers of negative and positive elements in the
vector w, then any such nonnegative signal vector is the unique
sparsest nonnegative vector satisfying y = Ax. Conversely, if

$$k \ge \min_{w \in \mathcal{N}(A),\, w \neq 0} \max\{k_{-,w}, k_{+,w}\},$$

then there exists a nonnegative k-sparse vector x such that it is
not the unique sparsest nonnegative vector satisfying y = Ax.

Proof: We first prove the forward direction. Indeed, any
vector x' satisfying y = Ax' must be of the form x' = x + w with
w from the null space of the matrix A. If $k < k_{-,w}$, then the k-sparse
nonnegative vector x plus the vector w will have at least
one negative element, so x + w cannot be a nonnegative solution
to y = Ax. If instead $k < k_{+,w}$, then the k-sparse nonnegative
vector x plus the vector w will have at least $k_{+,w}$ nonzero
elements, which is more than k nonzero elements.

Now we only need to prove that we can always find a k-sparse
signal x with $k \ge \min_{w \in \mathcal{N}(A),\, w \neq 0} \max\{k_{-,w}, k_{+,w}\}$
such that x is not the unique sparsest solution satisfying y =
Ax. We let $w \in \mathcal{N}(A)$ denote the nonzero vector minimizing
$\max\{k_{-,w}, k_{+,w}\}$.

In fact, we take a vector x supported on the set K,
with $|K| = k = \max\{k_{-,w}, k_{+,w}\}$, $K_{-,w} \subseteq K$ and
$K \subseteq K_{-,w} \cup K_{+,w}$, where $K_{-,w}$ is the index set for the
negative elements of w and $K_{+,w}$ is the index set for the
positive elements of w.

We let $x_K = |w_K|$ (taking elementwise absolute value).
Then x + w will be a $k_{+,w}$-sparse nonnegative vector,
and has no more than k nonzero elements.

■
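The construction in the second half of the proof can be traced on a toy example. The following is a minimal sketch, assuming a hypothetical 2 × 3 matrix and null vector chosen only for illustration; the helper names are not from the paper.

```python
def conflicting_pair(w):
    """Given a null-space vector w, build x as in the proof: x equals |w| on a
    set K that contains every negative index of w and has size max(k-, k+)."""
    neg = [j for j, v in enumerate(w) if v < 0]
    pos = [j for j, v in enumerate(w) if v > 0]
    k = max(len(neg), len(pos))
    support = neg + pos[:k - len(neg)]  # pad K with positive indices if needed
    x = [abs(v) if j in support else 0 for j, v in enumerate(w)]
    x_alt = [a + b for a, b in zip(x, w)]  # second nonnegative solution x + w
    return x, x_alt


def apply_matrix(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def nnz(x):
    return sum(1 for v in x if v != 0)


# Hypothetical example: w = (1, -1, 1) lies in the null space of A.
A = [[1, 1, 0],
     [0, 1, 1]]
w = [1, -1, 1]
x, x_alt = conflicting_pair(w)
print(x, x_alt, apply_matrix(A, x), apply_matrix(A, x_alt))
```

Both vectors are nonnegative, give the same measurements, and x + w has no more nonzero elements than x, so x is not the unique sparsest nonnegative solution.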

For comparison, we have a stricter, but easier to use,
condition for recovering an arbitrary (not necessarily nonnegative)
k-sparse vector x.

Theorem 2. Let y = Ax. If x is a signal vector with no more
than k nonzero elements, where

$$k < \min_{w \in \mathcal{N}(A),\, w \neq 0} \frac{\|w\|_0}{2},$$

where $\|w\|_0$ is the number of nonzero elements in the vector
w, then x is the unique sparsest vector satisfying y = Ax.
Conversely, if

$$k \ge \min_{w \in \mathcal{N}(A),\, w \neq 0} \frac{\|w\|_0}{2},$$

then there exists a k-sparse vector x such that it is not the
unique sparsest vector satisfying y = Ax.

Proof: The proof follows the same line as the proof of Theorem 1. ■

Based on the previous theorems, we can now give a stricter
sufficient condition for recovering k-sparse signals.



Theorem 3. Suppose that for every set of no more than h columns,
indexed by the set $H \subseteq \{1, 2, \ldots, |E|\}$, of the m × |E|
measurement matrix A, the corresponding m × |H| submatrix
$A_H$ (consisting of these columns of A) has at least one row,
say row i, such that there is a single nonzero element in that
row. Then any k-sparse signal vector x, with $k \le \frac{h}{2}$, is the
unique sparsest solution to y = Ax.

Proof: From Theorem 2, we only need to show that in
the null space of A, every nonzero vector will have at least
(h + 1) nonzero elements. In fact, suppose that there exists
a nonzero vector w from the null space of A, which
has no more than h nonzero elements, and suppose that its
support set is H. However, since there exists one row in $A_H$
with a single nonzero element, $A_H w_H$ must be nonzero, which
contradicts the fact that w is from the null space of A. So each
nonzero vector in the null space of A has at least (h + 1)
nonzero elements. From Theorem 2, every k-sparse vector x,
with $k \le \frac{h}{2}$ (so that $k < \frac{h+1}{2}$), will be the unique sparsest solution to y = Ax.

■
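For small matrices, the row condition of Theorem 3 can be verified by exhaustive search. The sketch below assumes a hypothetical 4 × 6 path matrix on a complete graph with four vertices (not taken from the paper) and checks every column set of size at most h.

```python
from itertools import combinations


def has_isolating_row(A, columns):
    """True if some row has exactly one nonzero entry within `columns`."""
    return any(sum(1 for j in columns if row[j] != 0) == 1 for row in A)


def satisfies_theorem3(A, h):
    """Check the row condition for every nonempty set of at most h columns."""
    n = len(A[0])
    return all(has_isolating_row(A, cols)
               for size in range(1, h + 1)
               for cols in combinations(range(n), size))


# Hypothetical path matrix on K4 (an assumption used only for illustration).
A = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
print(satisfies_theorem3(A, 2), satisfies_theorem3(A, 3))
```

Here the condition holds for h = 2 but fails for h = 3 (columns 3, 4 and 6 sum to a null vector with alternating signs), so for this small matrix Theorem 3 only certifies recovery of 1-sparse vectors. The search is exponential in h and is meant for toy instances only.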

B. How Many Measurement Paths are Needed? 

Now we want to show that O(k log(n)) measurements are
enough for recovering any k-sparse link vector for a sufficiently
connected graph with n nodes.

1) Graph Assumptions: Before we proceed, following the 
works on graph-constrained group testing [15], [8], we intro- 
duce the following assumptions on the graphs. 

The undirected graph G = (V, E) is called a (D, c) uniform
graph if for some constant c, the degree of each vertex $v \in V$
is between D and cD. Suppose that a standard random walk
over the graph has a stationary distribution $\mu$ over the nodes.
The $\delta$-mixing time of G is defined as the smallest t' such that
a random walk of length t' starting at any vertex in G ends up
having a distribution $\mu'$ such that $\|\mu - \mu'\|_\infty \le \delta$. We define
T(n) as the $\delta$-mixing time of G for $\delta = \left(\frac{1}{2cn}\right)^2$.
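The δ-mixing time defined above can be computed directly for small graphs by evolving the walk distribution from every starting vertex. This is a minimal sketch under stated assumptions: the graph is given as an adjacency-list dictionary, the stationary distribution of the standard walk on an undirected graph is taken to be degree/(2|E|), and the function names and the `max_steps` cutoff are illustrative choices.

```python
def step(dist, adj):
    """One step of the standard random walk applied to a distribution."""
    nxt = dict.fromkeys(adj, 0.0)
    for u, p in dist.items():
        for v in adj[u]:
            nxt[v] += p / len(adj[u])
    return nxt


def mixing_time(adj, delta, max_steps=10_000):
    """Smallest t such that, from every start vertex, the t-step distribution
    is within delta of the stationary one in the sup norm (None if not found)."""
    two_e = sum(len(nbrs) for nbrs in adj.values())
    mu = {v: len(nbrs) / two_e for v, nbrs in adj.items()}
    dists = {s: {v: 1.0 if v == s else 0.0 for v in adj} for s in adj}
    for t in range(max_steps + 1):
        worst = max(abs(d[v] - mu[v]) for d in dists.values() for v in adj)
        if worst <= delta:
            return t
        dists = {s: step(d, adj) for s, d in dists.items()}
    return None


# Complete graph on 4 vertices: a (D, c) uniform graph with D = 3, c = 1.
K4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
delta = (1 / (2 * 1 * 4)) ** 2  # delta = (1/(2cn))^2 with c = 1, n = 4
print(mixing_time(K4, delta))
```

For bipartite graphs the standard walk does not converge, which is why the sketch returns `None` after `max_steps` rather than looping forever.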

2) O(k log(n)) measurements are sufficient: In compressive
sensing, we adopt an m × |E| measurement matrix generated
by m independent random walks. For each random walk, we
pick a starting vertex uniformly at random from V and then
perform a standard random walk over the graph. The length
of the random walk is denoted by t. From [8], we have the
following theorem.

Theorem 4. There is a degree $D_0 = O(c^2 k T^2(n))$ such that
whenever $D \ge D_0$, by setting the
path lengths $t = O\!\left(\frac{nD}{c^3 k T(n)}\right)$ the following holds. Let B be a
set of at most (k - 1) edges in the graph G, and let e be an
edge not belonging to the set B. Then

$$\pi_{e,B} = \Omega\!\left(\frac{1}{c^4 k T^2(n)}\right),$$

where $\pi_{e,B}$ is the probability that the random walk passes
through link e, but misses all the edges from the set B.
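The measurement construction of this subsection (m independent walks of length t, each started at a uniformly random vertex) can be sketched as follows. The graph representation, the function name, and the `seed` parameter are assumptions made for illustration; entry (i, j) counts how many times walk i traverses edge j.

```python
import random


def random_walk_measurement_matrix(adj, edges, m, t, seed=0):
    """Build an m x |E| matrix whose i-th row counts the edge traversals of
    a length-t standard random walk started at a uniformly random vertex."""
    rng = random.Random(seed)
    edge_index = {frozenset(e): j for j, e in enumerate(edges)}
    vertices = sorted(adj)
    A = [[0] * len(edges) for _ in range(m)]
    for i in range(m):
        v = rng.choice(vertices)            # uniform starting vertex
        for _ in range(t):
            u = rng.choice(adj[v])          # move to a uniform random neighbor
            A[i][edge_index[frozenset((v, u))]] += 1
            v = u
    return A


# Complete graph on 4 vertices as a toy example.
K4_ADJ = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
K4_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
A = random_walk_measurement_matrix(K4_ADJ, K4_EDGES, m=5, t=3, seed=1)
for row in A:
    print(row)
```

Because each row is produced by an actual walk, its nonzero entries automatically correspond to a connected set of edges, which is the path constraint discussed in Section II.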

Now we take an arbitrary set of edges E' with cardinality
|E'| = k. Let us take m independent measurements satisfying
the graph path constraints. Then the probability that there does
not exist any measurement walk (each walk corresponds to
a row of the measurement matrix A) with a single nonzero
element in the columns corresponding to the edges from E'
can be expressed by

$$P = (1 - \pi_{E'})^m,$$

where $\pi_{E'}$ is the probability that a random walk visits one and
only one element from the set E'. In fact, $\pi_{E'} = \Omega\!\left(\frac{1}{c^4 k T^2(n)}\right) \times k$,
since the event of having a single nonzero element can be
divided into k disjoint events, each of which is the event that
the single nonzero element appears in one of the k possible
columns of E'.

Since there are $\binom{|E|}{k}$ ways of choosing the k edges, the
probability that there exists one edge set E' of |E'| = k without
any single-nonzero-element row is

$$
\begin{aligned}
P_{k,k} &\le \binom{|E|}{k} (1 - \pi_{E'})^m && (3) \\
&\le \binom{|E|}{k} \left(1 - \Omega\!\left(\frac{k}{c^4 k T^2(n)}\right)\right)^m && (4) \\
&\le e^{\,k\left(1 + \log\frac{|E|}{k}\right) + m \log\left(1 - \Omega\left(\frac{1}{c^4 T^2(n)}\right)\right)}. && (5)
\end{aligned}
$$

So if

$$e^{\,k\left(1 + \log\frac{|E|}{k}\right) + m \log\left(1 - \Omega\left(\frac{1}{c^4 T^2(n)}\right)\right)} < 1,$$

namely

$$m > \frac{k\left(1 + \log\frac{|E|}{k}\right)}{-\log\left(1 - \Omega\left(\frac{1}{c^4 T^2(n)}\right)\right)},$$

the probability $P_{k,k}$ will be smaller than 1.

Now let us look at a set E'' with cardinality |E''| = $k_1$
smaller than k. We notice that $\pi_{e,B} = \Omega\!\left(\frac{1}{c^4 k T^2(n)}\right)$ is true for
any edge e and any set B of cardinality no bigger than k. So
the probability $\pi_{e, E'' \setminus e}$ that a random walk visits edge e (and only
visits that edge e) from the set E'' is $\pi_{e, E'' \setminus e} = \Omega\!\left(\frac{1}{c^4 k T^2(n)}\right)$.
Again we take m independent random walk measurements
satisfying the graph constraints. Then the probability that there
does not exist any measurement having one and only one
nonzero element in the columns corresponding to the edge set
E'' is given by

$$P = (1 - \pi_{E''})^m,$$

where $\pi_{E''} = \Omega\!\left(\frac{k_1}{c^4 k T^2(n)}\right)$, since the events of having a unique
nonzero element over $k_1$ different columns are disjoint events.
Since there are $\binom{|E|}{k_1}$ ways of choosing the $k_1$ edges, the
probability that there exists one edge set E'' with |E''| = $k_1$
without any desired single-nonzero-element row is

$$
\begin{aligned}
P_{k,k_1} &\le \binom{|E|}{k_1} (1 - \pi_{E''})^m && (6) \\
&\le \binom{|E|}{k_1} \left(1 - \Omega\!\left(\frac{k_1}{c^4 k T^2(n)}\right)\right)^m && (7) \\
&\le e^{\,k_1\left(1 + \log\frac{|E|}{k_1}\right) + m \log\left(1 - \Omega\left(\frac{k_1}{c^4 k T^2(n)}\right)\right)}. && (8)
\end{aligned}
$$
m                    Compressive sensing           Group testing
Graph constrained    O(k log(n)) (this paper)      O(k^2 log(n/k)) [8]
General              O(k log(n/k)) [6]             O(k^2 log(n/k)) [14]

TABLE I: Number of measurements needed in different scenarios

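So we require

$$k_1\left(1 + \log\frac{|E|}{k_1}\right) + m \log\left(1 - \Omega\!\left(\frac{k_1}{c^4 k T^2(n)}\right)\right) < 0,$$

namely

$$m > \frac{k_1\left(1 + \log\frac{|E|}{k_1}\right)}{-\log\left(1 - \Omega\left(\frac{k_1}{c^4 k T^2(n)}\right)\right)}.$$

So as long as

$$m > \max_{1 \le k_1 \le k} \frac{k_1\left(1 + \log\frac{|E|}{k_1}\right)}{-\log\left(1 - \Omega\left(\frac{k_1}{c^4 k T^2(n)}\right)\right)},$$

with probability 1 - o(1), the measurement matrix A guarantees
recovering up to $\frac{k}{2}$-sparse link vectors (from Theorem 3). In
fact, $m = O(c^4 T^2(n) k \log(n))$ measurement paths suffice.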
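To get a feel for how the bound on m scales, the maximization over k_1 can be evaluated numerically once the hidden constant in the Ω(·) term is fixed. In the sketch below, `const` is a hypothetical stand-in for that constant (playing the role of 1/(c^4 T^2(n)) up to the constant factor); it is an assumption, not a value given in the paper.

```python
import math


def walks_needed(num_edges, k, const):
    """Evaluate max over 1 <= k1 <= k of
    k1 * (1 + log(|E| / k1)) / (-log(1 - const * k1 / k)),
    where `const` is an assumed stand-in for the Omega(.) constant."""
    if not 0 < const < 1:
        raise ValueError("const must lie strictly between 0 and 1")
    worst = 0.0
    for k1 in range(1, k + 1):
        numerator = k1 * (1 + math.log(num_edges / k1))
        denominator = -math.log(1 - const * k1 / k)
        worst = max(worst, numerator / denominator)
    return worst


for num_edges in (100, 1000, 10000):
    print(num_edges, round(walks_needed(num_edges, k=5, const=0.1), 1))
```

With `const` held fixed, the printed values grow roughly in proportion to k log(|E|), in line with the O(k log(n)) claim (recall |E| ≤ n^2).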

Table I provides a summary of the
number of measurements needed in graph constrained problems
and in general problems without graph constraints.

IV. $\ell_1$ Minimization Decoding

$\ell_1$ minimization has been a popular efficient decoding
method for inferring x from compressed measurements y = Ax
[6], [12]. $\ell_1$ minimization solves for $\min \|x\|_1$ subject to the
constraint y = Ax. However, it is not clear how one can
efficiently infer these sparse vectors over graphs. In this section,
we show that when the number of measurement paths is
m = O(k log(n)), $\ell_1$ minimization can recover any k-sparse
link vector efficiently. We will consider the matrix A generated
by regularized random walks with "good starts". Our proof
strategy is to show that under the same graph assumptions as
in the last section, A corresponds to a bipartite expander graph
with high probability. Then we use the expansion property to
show that the null space property of A guarantees the success of
$\ell_1$ minimization. Theorem 5 states that if the random walk
ever visits a small edge set, very likely it visits this set a small
number of times. Based on Theorem 5, Theorem 6 and Theorem
7 assert that A corresponds to a bipartite expander graph.
Theorem 8 and Lemma 3 give further regularity properties of
A. Finally, Theorem 9 shows how the expansion property implies
that $\ell_1$ minimization succeeds in recovering sparse vectors.
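As a concrete illustration of the decoder, $\ell_1$ minimization can be written as a linear program by splitting x = u - v with u, v ≥ 0 and minimizing the sum of all entries. The sketch below solves that program exactly on a tiny instance by enumerating basic feasible solutions; this brute-force approach is a swapped-in technique for illustration only (a practical implementation would call an LP solver), it assumes A has full row rank, and the 4 × 6 matrix is a hypothetical example rather than the paper's.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, b):
    """Solve M z = b exactly by Gauss-Jordan elimination; None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def l1_minimize(A, y):
    """min ||x||_1 s.t. Ax = y, via vertex enumeration of the LP
    min sum(u) + sum(v) s.t. [A, -A][u; v] = y, u, v >= 0."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    cols += [[-a for a in c] for c in cols]          # columns of -A
    best = None
    for basis in combinations(range(2 * n), m):
        M = [[cols[j][i] for j in basis] for i in range(m)]
        z = solve_square(M, y)
        if z is None or any(v < 0 for v in z):
            continue                                  # singular or infeasible
        cost = sum(z)
        if best is None or cost < best[0]:
            x = [Fraction(0)] * n
            for j, v in zip(basis, z):
                x[j % n] += v if j < n else -v
            best = (cost, x)
    return None if best is None else best[1]


A = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
x_hat = l1_minimize(A, [5, 0, 3, 0])
print([int(v) for v in x_hat])
```

On this instance the minimizer coincides with the 2-sparse vector (2, 3, 0, 0, 0, 0) used in the example of Section II. The enumeration is exponential in the problem size and only serves to make the optimization problem concrete.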

We first give the definitions about "good start" random walks, 
measurement matrix A constructed from regularized random 
walks, bipartite graphs corresponding to A and some basic 
assumptions about the graph we are considering. 

Definition 1 ("good start" random walk). A random walk with
a "good start" chooses the starting vertex with a probability
proportional to its degree and then performs a random walk of
length t over the graph. Namely, the random
walk starts at vertex i with probability $\frac{d_i}{2|E|}$, where $d_i$ is
the degree of vertex i and |E| is the total number of edges in
the graph.
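Definition 1 translates into a few lines of code. This is a sketch assuming an adjacency-list representation; the function names are illustrative. The starting distribution is computed exactly so that it can be compared with $d_i / (2|E|)$.

```python
import random
from fractions import Fraction


def good_start_distribution(adj):
    """Probability of starting at each vertex: degree / (2 |E|)."""
    two_e = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), two_e) for v, nbrs in adj.items()}


def good_start_walk(adj, t, rng):
    """Sample a 'good start' walk: degree-proportional start, then t steps."""
    vertices = list(adj)
    weights = [len(adj[v]) for v in vertices]
    walk = [rng.choices(vertices, weights=weights)[0]]
    for _ in range(t):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


# Path graph 0 - 1 - 2 as a toy example (degrees 1, 2, 1 and |E| = 2).
PATH = {0: [1], 1: [0, 2], 2: [1]}
print(good_start_distribution(PATH))
print(good_start_walk(PATH, t=4, rng=random.Random(0)))
```

Since degree/(2|E|) is the stationary distribution of the standard walk on an undirected graph, a walk started this way is stationary from time index 0, which is the property used later in the proof of Lemma 2.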
Fig. 2: A Bipartite Graph Representation for A



Definition 2 (matrix A from regularized random walks). Suppose
$W_2$ is a walk on an undirected graph G = (V, E). Then
a regularized walk $W_1$ adapted from $W_2$ is a walk which visits
the same set of edges as $W_2$ does, but visits each such edge
no more than twice. We will use the regularized walks adapted
from "good start" random walks to construct the rows of A.

From Lemma 3, we can always get a regularized walk from
a given walk. Using regularized walks, the maximum
element in A is upper bounded by 2.
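One way to see that a regularized walk always exists is to double every visited edge: the resulting multigraph is connected with all degrees even, so it has an Euler circuit, which traverses each original edge exactly twice. The sketch below implements this idea with Hierholzer's algorithm. It is an illustrative construction under the assumption of a simple graph (no self-loops) and is not claimed to be the construction used in Lemma 3.

```python
from collections import Counter


def edge_counts(walk):
    """How many times each undirected edge is traversed by a vertex sequence."""
    return Counter(frozenset(p) for p in zip(walk, walk[1:]))


def regularized_walk(walk):
    """Closed walk over the same edge set that uses every edge exactly twice
    (an Euler circuit of the doubled edge set, found by Hierholzer's method)."""
    copies = []                        # two copies of every visited edge
    incident = {walk[0]: []}
    for e in edge_counts(walk):
        a, b = tuple(e)
        for _ in range(2):
            incident.setdefault(a, []).append(len(copies))
            incident.setdefault(b, []).append(len(copies))
            copies.append((a, b))
    used = [False] * len(copies)
    stack, circuit = [walk[0]], []
    while stack:
        v = stack[-1]
        while incident[v] and used[incident[v][-1]]:
            incident[v].pop()          # discard copies already traversed
        if incident[v]:
            eid = incident[v].pop()
            used[eid] = True
            a, b = copies[eid]
            stack.append(b if v == a else a)
        else:
            circuit.append(stack.pop())
    return circuit[::-1]


original = [0, 1, 2, 1, 2, 1, 0, 3]    # traverses edge {1, 2} four times
regular = regularized_walk(original)
print(regular, dict(edge_counts(regular)))
```

The output visits the same three edges as the original walk, each exactly twice, so the corresponding row of A has maximum entry 2.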

Definition 3 (bipartite graph from an m × |E| matrix A).
We construct a bipartite graph by placing |E| "edge" nodes
on the left-hand side and m "measurement" nodes on the
right-hand side. An "edge" node j on the left is connected
to a "measurement" node i on the right if and only if the
i-th random walk goes through edge j. For $0 < \epsilon < 1$, a
bipartite graph is called a $(k, \epsilon)$ expander if every set of left
nodes S, with cardinality |S| ≤ k, is connected to at least
$(1 - \epsilon)|E(S)|$ right-hand side nodes (namely the neighbors of
S, denoted by N(S)), where E(S) is the set of links that go
from S to the right-hand side. In other words, |E(S)| is the total
number of nonzero elements in the columns corresponding to
S in A, |N(S)| is the number of nonzero rows in the submatrix
$A_S$, and $|N(S)| \ge (1 - \epsilon)|E(S)|$. $d_{\min}$ and $d_{\max}$ are respectively
the smallest and largest degrees of the left-hand "edge" nodes
in the bipartite graph.

For example, Figure 2 is the corresponding bipartite graph
for matrix A in (2).
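The quantities |N(S)| and |E(S)| of Definition 3 are easy to compute from A, and for small matrices the best expansion parameter can be found by exhaustive search. The sketch below uses a hypothetical 4 × 6 path matrix (an assumption, not necessarily the matrix in (2)); the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def neighbors_and_edges(A, S):
    """Return (|N(S)|, |E(S)|): nonzero rows and nonzero entries of A_S."""
    n_s = sum(1 for row in A if any(row[j] != 0 for j in S))
    e_s = sum(1 for row in A for j in S if row[j] != 0)
    return n_s, e_s


def smallest_epsilon(A, k):
    """Smallest eps with |N(S)| >= (1 - eps) |E(S)| for all |S| <= k."""
    worst = Fraction(0)
    for size in range(1, k + 1):
        for S in combinations(range(len(A[0])), size):
            n_s, e_s = neighbors_and_edges(A, S)
            if e_s:
                worst = max(worst, 1 - Fraction(n_s, e_s))
    return worst


A = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
print(neighbors_and_edges(A, (0, 1)), smallest_epsilon(A, 2))
```

For this toy matrix, columns 1 and 2 share one measurement, giving |N(S)| = 2 and |E(S)| = 3, and the matrix is a (2, ε) expander for any ε ≥ 1/3.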

In this section, we set $t = O\!\left(\frac{|E|}{k}\right)$ and also assume the
mixing time T(n) has an upper bound as n grows, which
will simplify the presentation of our analysis. However, our
results still extend to the case of growing T(n) by setting
$t = O\!\left(\frac{|E|}{T(n) k}\right)$ and $m = O(T(n)^2 k \log(n))$. We also assume
that the smallest degree D in the graph grows with n.

To prove the expansion property for A, for an arbitrary edge
set S with |S| = k, we bound the conditional probability that
a random walk visits another edge in S after it has already
visited one edge from S.

Theorem 5. Let $P_{\ge 1,S}$ be the probability that a "good start"
random walk ever visits an edge from an edge set S with |S| =
k. Let $P_{\ge 2,S}$ be the probability that such a random walk visits at
least two edges from S. Then we can always select the random
walk length in such a way that $t = O\!\left(\frac{|E|}{k}\right)$ and
$P_{\ge 2,S} \le \eta P_{\ge 1,S}$, namely the conditional probability

$$P(\text{the random walk visits more than 1 edge in } S \mid \text{the random walk visits at least 1 edge in } S) \le \eta, \tag{9}$$

where $0 < \eta < 1$ is a constant which can be made arbitrarily
close to 0. Similarly, for any 1 ≤ k' < k, $P_{\ge (k'+1),S} \le \eta P_{\ge k',S}$,
where $P_{\ge k',S}$ ($P_{\ge (k'+1),S}$) is the probability that the
random walk visits at least k' (k' + 1) edges from S, and $\eta$ is
the same $\eta$ as in (9).

Proof: Suppose that the random walk ever visits one or
more edges from the set S and suppose the first edge from S
the random walk visits is edge $i \in S$, visited between time
indices j - 1 and j, where 1 ≤ j ≤ t. By denoting the two
vertices connected by edge i as $v_{i,1}$ and $v_{i,2}$, we also assume
that at time index j, the random walk is at the l-th (l = 1 or 2)
vertex, denoted by $v_{i,l}$, of edge i. We denote the probability of
this event by $P_{i,v_{i,l},j}$ ($i \in S$), and further denote by $P_{\ge 2|i,v_{i,l},j}$
($i \in S$) the conditional probability that the random walk visits
another edge from S conditioned on this event (the random
walk visits $i \in S$ first between time index j - 1 and j, and sits
at vertex $v_{i,l}$ at time index j).

Since the probability $P_{\ge 1,S}$ that the random walk visits at
least one edge in S can be decomposed as

$$P_{\ge 1,S} = \sum_{i \in S} \sum_{j=1}^{t} \sum_{l=1}^{2} P_{i,v_{i,l},j},$$

we have

$$P_{\ge 2|\ge 1,S} = \frac{\sum_{i \in S} \sum_{j=1}^{t} \sum_{l=1}^{2} P_{i,v_{i,l},j} \times P_{\ge 2|i,v_{i,l},j}}{\sum_{i \in S} \sum_{j=1}^{t} \sum_{l=1}^{2} P_{i,v_{i,l},j}},$$

where $P_{\ge 2|\ge 1,S}$ is the conditional probability that the random
walk visits at least one more edge in S after already visiting
one edge in S.

Now if we can show the conditional probability $P_{\ge 2|i,v_{i,l},j}$
is small enough for every possible i, j and l, we will get
the conclusion in the theorem. By the Markov property of the
defined random walk, $P_{\ge 2|i,v_{i,l},j}$ ($i \in S$) is upper bounded by
the conditional probability that the random walk visits at least
one edge of S after time index j, conditioned on that the walk
sits at the vertex $v_{i,l}$ at time index j.

So we only need to show that

$$P(\text{the random walk visits } S \text{ again after time index } j \mid \text{the random walk is at vertex } v_{i,l} \text{ at time index } j)$$

is small enough or can be made arbitrarily close to 0 if we
choose the length of the random walk appropriately. Before we
proceed to upper bound this probability, we present the following
lemma about the conditional probability that a random
walk visits a certain edge after the mixing time T(n).

Lemma 1. For any vertex $v_{i,l}$ and any time index j, if z ≥ T(n)
(the $\delta$-mixing time), the conditional probability $P_{j+z,e|v_{i,l},j}$ that
the random walk visits one certain edge e between time index
j + z and j + z + 1 is between $\frac{1}{|E|} - \frac{2\delta}{D}$ and $\frac{1}{|E|} + \frac{2\delta}{D}$, where
D is the smallest degree for the vertices in the graph.


Proof: At time index j + z, no matter what vertex the
random walk is at time index j, by the definition of mixing
time, the random walk will visit the two vertices that define
edge e with probabilities in the regions $\left[\frac{d_1}{2|E|} - \delta, \frac{d_1}{2|E|} + \delta\right]$ and
$\left[\frac{d_2}{2|E|} - \delta, \frac{d_2}{2|E|} + \delta\right]$ respectively, where $d_1$ and $d_2$ are the degrees
of these two vertices. So between time index j + z
and j + z + 1, the probability that the random walk visits edge
e will be lower bounded by

$$\left(\frac{d_1}{2|E|} - \delta\right)\frac{1}{d_1} + \left(\frac{d_2}{2|E|} - \delta\right)\frac{1}{d_2} \ge \frac{1}{|E|} - \frac{2\delta}{D}, \tag{10}$$

and similarly, we have the upper bound. ■
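The core of Lemma 1 is that, once the walk is (nearly) stationary, every edge is traversed in a given step with probability close to 1/|E|, regardless of the degrees of its endpoints. The sketch below checks the exactly stationary case with rational arithmetic on a small irregular graph; the graph and function names are assumptions for illustration.

```python
from fractions import Fraction


def stationary_edge_probability(adj, u, v):
    """Probability that a stationary walk traverses edge {u, v} in one step:
    mu_u * (1/d_u) + mu_v * (1/d_v), with mu_w = d_w / (2 |E|)."""
    two_e = sum(len(nbrs) for nbrs in adj.values())
    mu_u = Fraction(len(adj[u]), two_e)
    mu_v = Fraction(len(adj[v]), two_e)
    return mu_u / len(adj[u]) + mu_v / len(adj[v])


def lemma1_interval(adj, delta):
    """The interval [1/|E| - 2 delta / D, 1/|E| + 2 delta / D] of Lemma 1."""
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    min_degree = min(len(nbrs) for nbrs in adj.values())
    center, slack = Fraction(1, num_edges), 2 * delta / min_degree
    return center - slack, center + slack


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2 (|E| = 4).
G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
print([stationary_edge_probability(G, u, v) for u, v in EDGES])
print(lemma1_interval(G, Fraction(1, 100)))
```

Every edge gets probability exactly 1/4 = 1/|E| here; the ±2δ/D slack in the lemma accounts for the walk being only δ-close to stationarity.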
Building on Lemma 1, to get the probability that the random
walk visits another edge from S conditioned on the fact it sits
at node $v_{i,l}$ at time j, we divide the random walk after time
index j into T(n) edge chains $f_1, f_2, f_3, \ldots, f_{T(n)}$ constructed
in the following way. The chain $f_{s+1}$ (0 ≤ s ≤ T(n) - 1)
starts from the edge traversed by the random walk between
time indices j + s and j + s + 1. Then this chain will
include sequentially the edges traversed by the random walk
between time index pairs (j + s + T(n), j + s + 1 + T(n)),
(j + s + 2T(n), j + s + 1 + 2T(n)), and so on, until the random walk
ends. Namely, we sample the random walk (after time j) with
a period of T(n) with T(n) different starting phases.
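The chain decomposition is simply a periodic subsampling of the walk's steps. A minimal sketch, where `steps` stands for the sequence of edges traversed after time index j and `period` plays the role of T(n) (both names are illustrative):

```python
def split_into_chains(steps, period):
    """Chain number s + 1 collects the steps at positions s, s + period,
    s + 2 * period, ... so consecutive entries are `period` steps apart."""
    return [steps[s::period] for s in range(period)]


# Ten steps after time index j, labelled 0..9, with period 3.
chains = split_into_chains(list(range(10)), period=3)
print(chains)
```

Every step of the walk lands in exactly one chain, and within a chain successive steps are separated by the mixing time, which is what allows Lemma 1 to be applied to each chain.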

Without loss of generality, we look at a chain $f_{s+1}$. At time
index j + s, the conditional probability (conditioned on the
fact the random walk is at vertex $v_{i,l}$ at time index j) that the
next edge traversed by $f_{s+1}$ is from S is at most $\frac{k}{D}$ because
no matter what vertex the random walk reached at time index
j + s, there are at least D edges connected to that vertex. Now
we look at the probability $P_s$ that $f_{s+1}$ does not traverse any
edge from S after time index s + j + T(n) (conditioned on the
fact the random walk is at vertex $v_{i,l}$ at time index j). Since
all the time indices are separated from each other and from
vertex $v_{i,l}$ by at least T(n) time slots, from the mixing time
definition, $P_s$ is at least $\left(1 - \left(\frac{k}{|E|} + \frac{2k\delta}{D}\right)\right)^{\lceil t/T(n) \rceil}$, where $\lceil \cdot \rceil$
represents the ceiling operation.

So the (conditional) probability that $f_{s+1}$ visits S after time
index j is upper bounded by

$$\frac{k}{D} + 1 - \left(1 - \left(\frac{k}{|E|} + \frac{2k\delta}{D}\right)\right)^{\lceil t/T(n) \rceil}.$$

Using a union bound over the T(n) chains, the conditional
probability that the random walk visits S again will be upper
bounded by

$$T(n) \times \left(\frac{k}{D} + 1 - \left(1 - \left(\frac{k}{|E|} + \frac{2k\delta}{D}\right)\right)^{\lceil t/T(n) \rceil}\right),$$

which can be further upper bounded by

$$\frac{kT(n)}{D} + T(n) \times \left(\frac{k}{|E|} + \frac{2k\delta}{D}\right) \times \left\lceil \frac{t}{T(n)} \right\rceil \le \frac{kT(n)}{D} + \frac{(t + T(n))k}{|E|} + \frac{2k(t + T(n))\delta}{D}.$$

So we can always take t scaling as $O\!\left(\frac{|E|}{k}\right)$ to make this
probability arbitrarily small (of course k must also make the
first and third terms small enough, which is easily true based on
the assumptions on D, T(n) and $\delta = \left(\frac{1}{2cn}\right)^2$).

Moreover, using the same set of arguments, we can extend
this conclusion to any 1 ≤ k' < k. ■

Theorem 6. With $t = O\!\left(\frac{|E|}{k}\right)$, for any arbitrary edge set S
with cardinality k, if we take m = O(T(n) k log(n)) "good
start" random walks, then with probability $1 - O(|E|^{-k})$, the
number of walks that traverse at least one edge of S is g =
$\Theta(k \log(n))$; moreover, with probability $1 - O(|E|^{-k})$, the total
number of edges from S visited by the m random walks
will be upper bounded by $r = (1 + \epsilon') \frac{g}{1 - \eta}$, where $\eta$ is the
conditional probability appearing in Theorem 5 and $\epsilon'$ is an
arbitrarily small number.

Proof: We start by providing a lower bound on the 
probability that the random walk ever visits S. 

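Lemma 2. The probability $P_{\ge 1}$ that a random walk of length
t visits an edge set S of cardinality k will be $\Omega\!\left(\frac{tk}{|E| T(n)}\right)$.

Proof: We consider a chain of period T(n) and focus on
the time slots starting with time index 0, T(n), 2T(n), .... Note
that at time index 0, the random walk has achieved its stationary
distribution due to the manner by which we pick the starting
vertex. Similar to the proof of Theorem 5, from the Markov
property and the mixing time definition, the probability $P_{\ge 1}$
that the random walk visits an edge in S is lower bounded by

$$
\begin{aligned}
1 - \left(1 - \left(\frac{k}{|E|} - \frac{2k\delta}{D}\right)\right)^{\lfloor t/T(n) \rfloor}
&= 1 - e^{\lfloor t/T(n) \rfloor \log\left(1 - \left(\frac{k}{|E|} - \frac{2k\delta}{D}\right)\right)} \\
&\ge 1 - e^{-\lfloor t/T(n) \rfloor \left(\frac{k}{|E|} - \frac{2k\delta}{D}\right)} \\
&= \Omega\!\left(\frac{tk}{|E| T(n)}\right).
\end{aligned}
$$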

Let $X_i$, 1 ≤ i ≤ m, be m independent Bernoulli random
variables indicating whether the i-th random walk visits the set
S, so each of them takes value '1' with probability $P_{\ge 1}$ and
takes value '0' with probability $(1 - P_{\ge 1})$. Let $X = \sum_{i=1}^{m} X_i$
be the total number of walks that visit the set S. When
$t = O\!\left(\frac{|E|}{k}\right)$ and m = O(T(n) k log(n)), the expected value
of X is $P_{\ge 1} m = P_{\ge 1} O(T(n) k \log(n))$. Now we show that
the actual number of random walks that visit S concentrates
around $\Theta(k \log(n))$.

From a Chernoff bound on X, the probability that X > P'm
when $P' > P_{\ge 1}$ (or X < P'm when $P' < P_{\ge 1}$) is upper
bounded by $e^{-m\,\mathrm{Diff}(P' \| P_{\ge 1})}$, where $\mathrm{Diff}(P' \| P_{\ge 1})$ is the
relative entropy

$$\mathrm{Diff}(P' \| P_{\ge 1}) = P' \log\frac{P'}{P_{\ge 1}} + (1 - P') \log\frac{1 - P'}{1 - P_{\ge 1}}.$$

So as long as

$$m > \frac{k \log(|E|)}{\mathrm{Diff}(P' \| P_{\ge 1})},$$

with probability $1 - O(|E|^{-k})$, X will concentrate around its
mean value $m P_{\ge 1}$ (not going above or below mP'). If $P' =
(1 - \epsilon') P_{\ge 1}$ or $P' = (1 + \epsilon') P_{\ge 1}$ for a sufficiently small $\epsilon' > 0$,
$\mathrm{Diff}(P' \| P_{\ge 1})$ scales as $\frac{\epsilon'^2 P_{\ge 1}}{1 - P_{\ge 1}}$.

So from Lemma 2, when m = O(T(n) k log(n)), with probability
$1 - O(|E|^{-k})$, the number of non-all-zero rows will be
$g = \Theta(k \log(n))$.
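The Chernoff step can be made concrete by evaluating the relative entropy and the resulting threshold on m. In this sketch the function names and the numerical values of P', P≥1, k and |E| are hypothetical choices for illustration only.

```python
import math


def bernoulli_relative_entropy(p_prime, p):
    """Diff(P' || P) for Bernoulli parameters 0 < p_prime, p < 1."""
    return (p_prime * math.log(p_prime / p)
            + (1 - p_prime) * math.log((1 - p_prime) / (1 - p)))


def chernoff_tail_bound(m, p_prime, p):
    """Upper bound exp(-m * Diff(P' || P)) on the deviation probability."""
    return math.exp(-m * bernoulli_relative_entropy(p_prime, p))


def walks_for_concentration(k, num_edges, p_prime, p):
    """Number of walks m at which the bound equals |E|^(-k)."""
    return k * math.log(num_edges) / bernoulli_relative_entropy(p_prime, p)


p, eps = 0.1, 0.1                      # assumed P>=1 and relative deviation
m = walks_for_concentration(k=2, num_edges=100, p_prime=(1 + eps) * p, p=p)
print(round(m), chernoff_tail_bound(m, (1 + eps) * p, p))
```

At this value of m the tail bound equals |E|^(-k), which is the failure probability that the subsequent union bound over edge sets needs.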

Now let Yi, 1 < i < to, be m independent random variables 
indicating how many edges from S the i-th random walk 
visits. Let Y = Y^iLi 5* be the total number of edges from 
S visited by m independent random walks. From Theorem 
[5] when m = 0(T(n)k log(n)), then the probability that 
Y > r = (1 + e')i^n w ^ be no digger than the probability 
Y' — YliLi Yl > r, where F/s are i.i.d. nonnegative integer- 
valued random variables and each of these m random variables 
takes value '0' with probability 1 — P>i, '1' with probability 
P>i(v~ 7 7 2 )> value '2' with probability P>i(tj 2 — rf),... and so 
on. So for each 1 < i < to, E(K/) = and E(Y') = 
For any e' > 0, by a standard Chernoff bound for Y', with 
rn = 0{T(n)k\og(n)), Y > (1 + e') 1 !^ 1 with probability 
at most $O(|E|^{-k})$. (However, we choose not to present the
explicit large deviation exponent for Y in this paper because of
its complicated expression.) ■

Since there are at most $\binom{|E|}{k}$ edge sets of cardinality k, by a
union bound and Theorem 6 (where we replace n with $|E| \leq
n^2$), with probability $1 - o(1)$, for all the edge sets S with
cardinality k, the number N(S) of random walks that visit
S will be at least $\frac{1-\eta}{1+\epsilon'}|E(S)|$. Note that this corresponds to
the expansion concept we mentioned at the beginning of this
section.

By repeating the previous arguments for smaller edge sets,
we know that, with high probability, the expansion properties
also hold for all the edge sets with cardinality less than k when
$m = O(T(n) k \log(n))$. So in the end, we have the following theorem
about expansion.

Theorem 7. If $t = O(|E|/k)$, then a measurement matrix
generated by $m = O(T(n) k \log(n))$ "good start" random
walks with length t will be a $\left(k, 1 - \frac{1-\eta}{1+\epsilon'}\right)$ expander, where
$\eta$ is the same $\eta$ appearing in Theorem [?] and $\epsilon' > 0$ is any
positive number independent of $\eta$.

Now we want to determine the largest degree $d_{\max}$ and the
smallest degree $d_{\min}$ for the bipartite expander. Note that for an edge
e, the number of visiting random walks is equal to the degree
of edge e's corresponding "edge" node in the bipartite graph.
Theorem 8 bounds $d_{\max}$ and $d_{\min}$.



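Theorem 8. Choose the random walk parameters appropriately.
Then the probability that a random walk visits a certain
edge e will be between

$$P_{\min} = 1 - \left(1 - \frac{1}{|E|} + \frac{2\delta}{D}\right)^{\lfloor t/T(n) \rfloor}$$

and

$$P_{\max} = T(n)\left(1 - \left(1 - \frac{1}{|E|} - \frac{2\delta}{D}\right)^{\lceil t/T(n) \rceil}\right).$$

For an arbitrary $\epsilon' > 0$, with probability $1 - o(1)$, the
number of nonzero elements in every column of A is between
$(1 - \epsilon')P_{\min} m$ and $(1 + \epsilon')P_{\max} m$, when we take
$m = O(T(n) k \log(n))$ random walks.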

Proof: First, we establish the lower bound. We focus on
the time slots starting with time index 0, T(n), 2T(n), .... By
the definition of mixing time, the probability that this sampled
walk does not visit edge e is upper bounded by
$\left(1 - \frac{1}{|E|} + \frac{2\delta}{D}\right)^{\lfloor t/T(n) \rfloor}$, so we have a corresponding lower bound
$1 - \left(1 - \frac{1}{|E|} + \frac{2\delta}{D}\right)^{\lfloor t/T(n) \rfloor}$ on the probability that the walk visits edge e.

For the upper bound, we consider T(n) chains
$f_1, f_2, \ldots, f_{T(n)}$ of period T(n). Then for each chain, the
probability that the chain does not visit the edge e will be lower
bounded by $\left(1 - \frac{1}{|E|} - \frac{2\delta}{D}\right)^{\lceil t/T(n) \rceil}$, and so the probability that
the sampled walk visits edge e will be upper bounded by

$$1 - \left(1 - \frac{1}{|E|} - \frac{2\delta}{D}\right)^{\lceil t/T(n) \rceil}.$$

By a union bound over the T(n) chains, the probability that
the random walk ever visits edge e is upper bounded by

$$P_{\max} = T(n)\left(1 - \left(1 - \frac{1}{|E|} - \frac{2\delta}{D}\right)^{\lceil t/T(n) \rceil}\right).$$

When we take $t = O(|E|/k)$, the lower bound and upper
bound scale as $O\left(\frac{1}{kT(n)}\right)$ and $O\left(\frac{1}{k}\right)$ respectively. So by
Chernoff bound arguments similar to those in Theorem 6, if $m =
O(T(n) k \log(n))$, with high probability, simultaneously for all
the columns, the number of nonzero elements concentrates
between $O(\log(n))$ and $O(T(n) \log(n))$. ■
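To make the edge-visiting probability concrete, here is a minimal Monte Carlo sketch under simplifying assumptions that are not part of the proof: the graph is a complete graph on n vertices, and the walk starts from the uniform distribution, which is stationary for a simple random walk on a regular graph. In that setting each step traverses a fixed edge with probability $1/|E|$, so a union bound over the t steps gives the upper estimate $t/|E|$. The function name and parameter values are hypothetical.

```python
import random


def visit_probability(n, t, edge, trials, seed=0):
    """Estimate the probability that a t-step simple random walk on the
    complete graph K_n, started from the uniform (stationary) distribution,
    traverses the given undirected edge at least once."""
    rng = random.Random(seed)
    target = set(edge)
    hits = 0
    for _ in range(trials):
        u = rng.randrange(n)              # "good start": stationary start vertex
        for _ in range(t):
            v = rng.randrange(n - 1)      # uniform neighbour of u in K_n
            if v >= u:
                v += 1
            if {u, v} == target:
                hits += 1
                break
            u = v
    return hits / trials


n, t = 10, 9
num_edges = n * (n - 1) // 2              # |E| = 45 for K_10
estimate = visit_probability(n, t, edge=(0, 1), trials=20000)
union_bound = t / num_edges               # each step hits the edge w.p. 1/|E|

print(f"estimated visit probability: {estimate:.3f}")
print(f"union bound t/|E|:           {union_bound:.3f}")
```

The estimate sits somewhat below $t/|E|$ because a walk can traverse the same edge more than once, which the union bound counts repeatedly.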

Lemma 3. Any walk W taken over an undirected graph can 
be converted to a walk that visits the same set of edges and 
visits each edge no more than twice. 

Proof: We induct on the number of nodes that the
walk visits. Clearly, for up to 2 nodes, this claim is true. We
assume this claim is true for any walk that visits up to n nodes.
If a walk visits (n + 1) nodes, there must be a node N
such that when N is deleted from the walk, the remaining parts
of the walk remain connected. In fact, take an arbitrary node i
on the walk; then all the other nodes are on a spanning
tree whose root is node i. Any leaf node of this tree can
be deleted while all the remaining nodes remain connected. By
the induction assumption, we know there exists a walk W' that
visits each edge of the remaining n-node graph at least once
but at most twice. Then we can construct another walk W''
over the (n + 1)-node network in the following way. We start
on walk W'. When walk W' visits a node j that is connected
to node N through an edge $e_1$ in the walk W, we divert
from node j via edge $e_1$ to visit node N and come back along
the same edge to node j. From there, we continue in a similar
fashion along the walk W' to complete constructing the new
walk W'', which visits every edge of W, but no more than
twice. ■
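The statement of Lemma 3 can also be illustrated constructively. The sketch below does not follow the inductive leaf-removal argument of the proof; it swaps in a depth-first traversal of the subgraph formed by the walk's edges, which crosses every edge exactly twice (out and back). The function names are hypothetical, and the input is assumed to be a walk on a simple undirected graph given as a list of nodes.

```python
from collections import Counter, defaultdict


def at_most_twice_walk(walk):
    """Return a closed walk from walk[0] that uses exactly the edges of
    `walk` and traverses each of them exactly twice (DFS out-and-back)."""
    adj = defaultdict(list)
    edges = set()
    for u, v in zip(walk, walk[1:]):
        e = frozenset((u, v))
        if e not in edges:                # record each undirected edge once
            edges.add(e)
            adj[u].append(v)
            adj[v].append(u)

    seen_nodes, used_edges = set(), set()
    out = [walk[0]]

    def dfs(u):
        seen_nodes.add(u)
        for v in adj[u]:
            e = frozenset((u, v))
            if e in used_edges:
                continue
            used_edges.add(e)
            out.append(v)                 # go across the edge
            if v not in seen_nodes:
                dfs(v)                    # explore further from a new node
            out.append(u)                 # come back along the same edge

    dfs(walk[0])
    return out


def edge_counts(walk):
    """Count how many times each undirected edge is traversed."""
    return Counter(frozenset(p) for p in zip(walk, walk[1:]))


original = [1, 2, 3, 1, 2, 3, 1, 2, 4]    # edge {1, 2} is traversed three times
converted = at_most_twice_walk(original)
print(converted)
print(max(edge_counts(converted).values()))
```

Recursion keeps the sketch short; for very long walks an explicit stack would avoid Python's recursion limit.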

Theorem 9. With probability $1 - o(1)$, $\ell_1$ minimization can
recover any $\Theta(k)$-sparse edge vector measured using a matrix
A generated from $m = O(T(n) k \log(n))$ independent "good
start" regularized random walks of length $t = O(|E|/k)$.

Proof: $\ell_1$ minimization recovers every k'-sparse vector
if and only if, for every nonzero vector $w \in \mathcal{N}(A)$, $\|w_{K'}\|_1 <
\alpha\|w\|_1$ for any edge index set K' with cardinality $k' = O(k)$, where
$\alpha \leq \frac{1}{2}$. By Theorem 7, the measurement matrix A generated
by $m = O(T(n) k \log(n))$ "good start" random walks of length
$O(|E|/k)$ corresponds to a bipartite $(k, \epsilon)$ expander graph with
high probability, where $\epsilon > 0$ is a constant which can be
made arbitrarily close to 0 if we choose t and m appropriately.
Now we show that for such an A with expansion, the null space
requirement for $\ell_1$ success is satisfied for $|K'| = O(k)$. The
proof follows the same line of reasoning as in [2],
except that it takes care of the uneven nonzero
elements in A and the unequal degrees of the left-hand side nodes.
Readers are encouraged to see [2] for more detailed
explanations.

Let K' be the index set of the largest elements (in amplitude) in
a nonzero vector $w \in \mathcal{N}(A)$, with cardinality $|K'| = k' \leq \frac{k}{2}$.
So they correspond to k' "edge" nodes in the bipartite graph
representation for A. We first argue that

$$\|A_{K'} w_{K'}\|_1 \geq (d_{\min} - 4\epsilon d_{\max})\,\|w_{K'}\|_1. \qquad (11)$$



Let us imagine a bipartite graph for A, but with no links
(between the left-hand nodes and right-hand nodes) yet. Consider
the following process of adding the links to the left-hand "edge"
node set K' one by one. We start by adding the links to the
left-hand "edge" node that corresponds to the largest element
of w in amplitude, then the links corresponding to the second
largest element of w in amplitude, and so on. If a newly added
link is connected to a right-hand side "measurement" node that
is already "plugged in" by some previously added links, we
say that a "collision" occurs. If no "collisions" occurred,
$\|A_{K'} w_{K'}\|_1$ would be at least $d_{\min}\|w_{K'}\|_1$. By the expansion
property of the bipartite graph, when we are done adding the
links of the left-hand node corresponding to the i-th ($i \leq k$)
largest element of w in amplitude, at most $\epsilon d_{\max} i$ collisions
occur. Since we already rank the elements of w in amplitude,
by the triangle inequality for the $\ell_1$ norm, these collisions will
add up to at most $2\epsilon d_{\max}\|w_{K'}\|_1$ (the factor 2 comes from
the fact that the elements in A are upper bounded by 2 via
regularized random walks). This will result in a loss of at most
$4\epsilon d_{\max}\|w_{K'}\|_1$ in $\|A_{K'} w_{K'}\|_1$ by the triangle inequality for
the $\ell_1$ norm, which leads to (11).

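Now we partition the index set $\{1, 2, \ldots, |E|\}$ into l subsets of
size k' (except for the last subset) in decreasing order of w
(in amplitude), where $l = \lceil |E|/k' \rceil$. Denote these subsets by
$K_0, K_1, \ldots, K_{l-1}$. Since $Aw = 0$, over the set
N(S) of right-hand "measurement" nodes that are connected
to K' ($K_0 = K'$),

$$0 = \Big\|A_{K'} w_{K'} + \sum_{i=1}^{l-1} A_{K_i} w_{K_i}\Big\|_1
\geq \|A_{K'} w_{K'}\|_1 - \sum_{i=1}^{l-1} \|A_{K_i} w_{K_i}\|_1$$

$$\geq (d_{\min} - 4\epsilon d_{\max})\,\|w_{K'}\|_1 - 4\epsilon d_{\max} \sum_{i=1}^{l-1} \|w_{K_{i-1}}\|_1
\geq (d_{\min} - 4\epsilon d_{\max})\,\|w_{K'}\|_1 - 4\epsilon d_{\max}\,\|w\|_1,$$

where the second inequality is due to (11), the $(k, \epsilon)$ expansion property
(which results in at most $2\epsilon d_{\max} k'$ link "collisions" between any
set $K_i$ and K'), the upper bound 2 for elements in A, and the fact
that every element of $w_{K_i}$ is no larger in amplitude than the average
$\|w_{K_{i-1}}\|_1 / k'$ over the preceding subset. Again,
please refer to [3] for more explanations. So in summary, for
any nonzero $w \in \mathcal{N}(A)$,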



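$$\|w_{K'}\|_1 \leq \frac{4\epsilon d_{\max}}{d_{\min} - 4\epsilon d_{\max}}\,\|w\|_1.$$

As long as $\frac{4\epsilon d_{\max}}{d_{\min} - 4\epsilon d_{\max}} < \frac{1}{2}$, that is, $\frac{\epsilon d_{\max}}{d_{\min}} < \frac{1}{12}$, $\ell_1$ minimization can recover
any k'-sparse signal via $O(T(n) k \log(n))$ measurements, where
$k' \leq \frac{k}{2}$ (conditioned on the expansion property for A, obtained by setting t
and m appropriately, which is possible from Theorems 7 and 8). ■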

V. Numerical Examples 

In this section, we provide numerical simulation results
demonstrating the performance of compressive sensing over
graphs. In all the simulations, we generate the measurement
matrix A from independent random walks of certain lengths,
subject to the graph topology constraints.
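The matrix generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code behind the reported figures: the graph is a complete graph, each row of A records how many times one random walk traverses each edge (multiplicities are kept as-is rather than regularized), and all function names are hypothetical. The $\ell_1$ decoding step is omitted because it needs a linear programming solver.

```python
import random
from itertools import combinations


def complete_graph_edge_index(n):
    """Map each undirected edge of K_n to a column index."""
    return {frozenset(e): i for i, e in enumerate(combinations(range(n), 2))}


def random_walk_row(n, edge_index, t, rng):
    """One measurement row: traversal counts of a t-step random walk on K_n."""
    row = [0] * len(edge_index)
    u = rng.randrange(n)                  # uniform start (stationary on K_n)
    for _ in range(t):
        v = rng.randrange(n - 1)          # uniform neighbour of u
        if v >= u:
            v += 1
        row[edge_index[frozenset((u, v))]] += 1
        u = v
    return row


def measurement_matrix(n, m, t, seed=0):
    """Stack m independent random-walk rows into a matrix (list of lists)."""
    rng = random.Random(seed)
    edge_index = complete_graph_edge_index(n)
    return [random_walk_row(n, edge_index, t, rng) for _ in range(m)]


def measure(A, x):
    """Path measurements y = A x for an edge vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Small demonstration: 8 walks of length 612 on the 50-vertex complete graph.
A = measurement_matrix(n=50, m=8, t=612)
x = [0.0] * 1225                          # 2-sparse edge vector (illustrative)
x[3], x[700] = 1.5, -2.0
y = measure(A, x)
print(len(A), len(A[0]), y[:3])
```

Each row sums to the walk length t, since every step of the walk traverses exactly one edge.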

Example 1: Figure 3 shows the recovery percentage of $\ell_1$
minimization for k-sparse edge signal vectors over a complete
graph with 50 vertices and 1225 edges. The k edges with
nonzero elements are chosen uniformly at random among the
1225 edges. For this example, we take m = 612 random
walks of length t = 612 to collect 612 measurements. Two
scenarios are considered. One is for edge signal vectors
whose nonzero elements are real-valued and Gaussian distributed,
so they can take positive and negative values. The other scenario
is for vectors with nonnegative nonzero elements, for which
we impose nonnegativity constraints in the $\ell_1$ minimization
decoding. While saving 50 percent of the measurements, $\ell_1$ minimization
can recover real-valued sparse vectors with 17 percent
nonzero elements or nonnegative sparse vectors with about 24
percent nonzero elements, even under the graph constraints.

Example 2: In this example, we consider a random graph
model of 50 nodes, where there is an edge with probability
p = 0.5 between any two nodes. So on average, we have
around 600 edges in the final graph. We tested $\ell_1$ decoding
for real-valued sparse signal recovery in the same fashion
as in Example 1. The length t of each random walk is set to
one third of $|E|$. In Figure 4, we plot the relationship between
the number of measurements and the maximum recoverable
sparsity k. A sparsity level is deemed recoverable if 99 percent of
the k-sparse vectors have been recovered in the experiment.
Fig. 3: n = 50 Complete Graph, with t = 612 and m = 612
Fig. 4: n = 50 Random Graph
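The random-graph setup of Example 2 can be sketched in the same spirit. The code below is an illustration under assumptions rather than the original experiment: it draws a graph in which each pair of nodes is joined independently with probability p, picks the start vertex with probability proportional to its degree (the stationary distribution of a simple random walk on an undirected graph), and walks for one third of $|E|$ steps. Function names and the seed are hypothetical.

```python
import random


def random_graph(n, p, rng):
    """Adjacency lists of a graph where each pair is an edge w.p. p."""
    adj = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def degree_weighted_walk(adj, t, rng):
    """A t-step simple random walk whose start vertex is drawn with
    probability proportional to degree (stationary start)."""
    nodes = [u for u in adj if adj[u]]    # skip isolated vertices, if any
    start = rng.choices(nodes, weights=[len(adj[u]) for u in nodes])[0]
    walk = [start]
    for _ in range(t):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


rng = random.Random(1)
adj = random_graph(50, 0.5, rng)
num_edges = sum(len(nb) for nb in adj.values()) // 2
t = num_edges // 3                        # walk length: one third of |E|
walk = degree_weighted_walk(adj, t, rng)

# The expected edge count is 0.5 * 50 * 49 / 2 = 612.5.
print(num_edges, t, len(walk))
```

The expected number of edges, 612.5, matches the "around 600 edges" quoted in Example 2.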



VI. Conclusion 



We study network tomography problems from the angle of
compressive sensing. The unknown vectors to be recovered are
sparse vectors representing certain parameters of the links over
the graph. The collective additive measurements we are allowed
to take must follow paths over the underlying graphs. For a
sufficiently connected graph with n nodes, we find that $O(k \log(n))$
path measurements are enough to recover any sparse link vector
with no more than k nonzero elements. We further demonstrate
that $\ell_1$ minimization can be used to recover such sparse vectors
with theoretical guarantees. Further research is needed to
find efficient ways to construct measurement paths. In addition,
it is also of interest to investigate the possibility of using
nonlinear measurements and low-rank matrix recovery [25][26].
So far we have only studied compressive sensing over graphs
for ideally sparse signals; extensions to noisy measurements
are part of future work. It is also interesting to consider more
efficient polynomial-time algorithms for compressive sensing
over graphs [23].